Abstract

This paper addresses the following general scenario: A scientist wishes to perform a battery
of experiments, each generating a sequential stream of data, to investigate some phenomenon.

The scientist would like to control the overall error rate in order to draw statistically-valid 
conclusions from each experiment, while being as efficient as possible. The between-stream 
data may differ in distribution and dimension but also may be highly correlated, even du- 
plicated exactly in some cases. Treating each experiment as a hypothesis test and adopting 
the familywise error rate (FWER) metric, we give a procedure that sequentially tests each 
hypothesis while controlling both the type I and II FWERs regardless of the between-stream 
correlation, and only requires arbitrary sequential test statistics that control the error rates for 
a given stream in isolation. The proposed procedure, which we call the sequential Holm pro- 
cedure because of its inspiration from Holm's (1979) seminal fixed-sample procedure, shows 
simultaneous savings in expected sample size and less conservative error control relative to 
fixed sample, sequential Bonferroni, and other recently proposed sequential procedures in a 
simulation study. 


1 Introduction 

This paper addresses the following scenario: A scientist wishes to perform a battery of $k \ge 2$
experiments sequentially in time in order to investigate some phenomenon, resulting in $k$ data
streams:

$$
\begin{array}{lll}
\text{Data stream 1} & X_1^{(1)}, X_2^{(1)}, \ldots & \text{from Experiment 1}\\
\text{Data stream 2} & X_1^{(2)}, X_2^{(2)}, \ldots & \text{from Experiment 2}\\
\quad\vdots & & \\
\text{Data stream } k & X_1^{(k)}, X_2^{(k)}, \ldots & \text{from Experiment } k.
\end{array}
\tag{1}
$$

The scientist would like to control the overall error rate of the battery of experiments in order 
to be able to draw statistically-valid conclusions for each experiment once all experimentation 
has ceased, but also needs to be as efficient as possible with the finite resources available by 
"dropping" certain experiments (i.e., stopping experimentation) when additional data is no 
longer needed from that stream to reach a conclusion. The between-stream data may be very
dissimilar in distribution and dimension, but at the same time may be highly correlated, or 
even duplicated exactly in some cases, since they all are related to some phenomenon. 

The preceding scenario occurs in a number of real applications including multiple endpoint
(or multi-arm) clinical trials (Jennison and Turnbull, 2000, Chapter 15), multi-channel
changepoint detection (Tartakovsky et al., 2003) and its applications to biosurveillance (Mei, 2010),
genetics and genomics (Dudoit and van der Laan, 2008), acceptance sampling with multiple
criteria (Baillie, 1987), and financial trading strategies (Romano and Wolf, 2005). If we think
of each experiment as a hypothesis test about that corresponding data stream, then what is
needed is a combination of a multiple hypothesis test and a sequential hypothesis test. We 
point out that our use of the word "sequential" here and below refers to the manner of sam- 
pling (or equivalently, observation) and differs from the way the word is sometimes used in 
the literature on fixed-sample multiple testing procedures to describe the stepwise analysis of 
fixed-sample test statistics, e.g., p-values. 

The scenario described above was addressed by Bartroff and Lai (2010), who gave a procedure
that sequentially tests $k$ hypotheses while controlling the type I familywise error rate
(Hochberg and Tamhane, 1987), i.e., the probability of rejecting any true hypotheses, at a
prescribed level. Their procedure requires only the existence of basic sequential tests for each data
stream and makes no assumptions about the dependence between the different data streams;
in particular, the error control holds when the streams are highly positively correlated, as
is often the case in the application areas mentioned above. The current paper introduces a
procedure to test $k$ hypotheses while simultaneously controlling both the type I and II
familywise error rates (defined precisely below) at prescribed levels in the same general setting: No
assumptions are made about the dependence between the different data streams. We call this
new procedure the sequential Holm procedure because of its relation to Holm's (1979) seminal
fixed-sample "step-down" procedure, which controls the familywise error rate. Following a review
of relevant previous work, we give a general formulation of the sequential Holm procedure in
Section 2. In Section 3 we consider simple hypotheses and then discuss composite hypotheses,
followed by simulation studies in Section 4 and finally a discussion of future extensions and a summary.



1.1 Background and Previous Work 

Separately, multiple testing and sequential testing are both quite mature fields, the former
dating back to classical "multiple comparison" procedures of Fisher (1932), Scheffé (1953),
Tukey, and others (see Seber and Lee, 2003) for testing hypotheses about parameter vectors in
linear models. Work on sequential hypothesis testing dates back to Wald's (1947) invention of
sequential analysis following World War II (see Siegmund (1985) for a summary of the major
developments). However, the intersection of these two areas is less well-developed in a general
setting. One area that has been considered is the adaptation of some classical fixed-sample tests
about vector parameters, such as those mentioned above, to the sequential sampling setting,
including O'Brien and Fleming's (1979) sequential version of Pearson's $\chi^2$ test, and Tang et al.'s
(1989; 1993) group sequential extensions of O'Brien's (1984) generalized least squares statistic.
For bivariate normal populations, Jennison and Turnbull (1993) proposed a sequential test of
two one-sided hypotheses about the bivariate mean vector, and Cook and Farewell (1994)
proposed a sequential test in a similar setting but where one of the hypotheses is two-sided.
A procedure for comparing three treatments was proposed by Siegmund (1993), related to
Paulson's (1964) earlier procedure for selecting the largest mean of $k$ normal distributions,
which Bartroff and Lai (2010) showed to be a special case of their more general sequential
step-down method, mentioned in the previous paragraph. All of these procedures aim to explicitly
control either the classical type I error probability or the type I familywise error rate. The
first sequential procedures to simultaneously control both the type I and II familywise error
rates were introduced by De and Baron (2012a,b); however, these procedures are constrained
to continue sampling all data streams until accept/reject decisions can be reached for all data
streams, making their form and performance quite different from the sequential Holm procedure
proposed here.

2 General Formulation 
2.1 Notation and Set-Up 

For simplicity of presentation we introduce the procedure in the fully-sequential setting where
the possible stopping times can be any positive integer $n = 1, 2, \ldots$, although formulations
in other settings like group-sequential and truncated settings are possible with only minor
modifications. Fix the number $k \ge 2$ of data streams and let $[k] = \{1, \ldots, k\}$. Assume that
there are $k$ streams (1) of sequentially observable data and, for each $i \in [k]$, it is desired to
test the null hypothesis $H^{(i)}$ versus the alternative hypothesis $G^{(i)}$ about the parameter $\theta^{(i)}$
governing the $i$th data stream $X_1^{(i)}, X_2^{(i)}, \ldots$, where $H^{(i)}$ and $G^{(i)}$ are disjoint subsets of the
parameter space $\Theta^{(i)}$ containing $\theta^{(i)}$. The individual parameters $\theta^{(i)}$ may themselves be vectors,
and the global parameter $\theta = (\theta^{(1)}, \ldots, \theta^{(k)})$ is the concatenation of the individual parameters
and is contained in the global parameter space $\Theta = \Theta^{(1)} \times \cdots \times \Theta^{(k)}$.

Given $\theta \in \Theta$, let $\mathcal{T}(\theta) = \{i \in [k] : \theta^{(i)} \in H^{(i)}\}$ denote the indices of the "true" hypotheses
and $\mathcal{F}(\theta) = \{i \in [k] : \theta^{(i)} \in G^{(i)}\}$ the indices of the "false" null hypotheses. The type I and II
familywise error rates, denoted $\mathrm{FWE}_I(\theta)$ and $\mathrm{FWE}_{II}(\theta)$, are defined as

$$
\begin{aligned}
\mathrm{FWE}_I(\theta) &= P_\theta\big(H^{(i)} \text{ rejected, any } i \in \mathcal{T}(\theta)\big),\\
\mathrm{FWE}_{II}(\theta) &= P_\theta\big(H^{(i)} \text{ accepted, any } i \in \mathcal{F}(\theta)\big).
\end{aligned}
$$

Here the notion of rejecting (resp. accepting) $H^{(i)}$ is equivalent to accepting (resp. rejecting)
$G^{(i)}$. This definition of $\mathrm{FWE}_I(\theta)$ is the same as the standard one for fixed-sample testing
(such as in Hochberg and Tamhane, 1987) and $\mathrm{FWE}_{II}(\theta)$ is defined analogously; the quantity
$1 - \mathrm{FWE}_{II}(\theta)$ has been called "familywise power" by some authors (e.g., Lee, 2004).



The building blocks of the sequential Holm procedure defined below are $k$ individual
sequential test statistics $\{\Lambda^{(i)}(n)\}_{i \in [k],\, n \ge 1}$, where $\Lambda^{(i)}(n)$ is the statistic for testing $H^{(i)}$ vs.
$G^{(i)}$ based on the data $X_1^{(i)}, X_2^{(i)}, \ldots, X_n^{(i)}$ available from the $i$th stream at time $n$. Concrete
examples of these test statistics are given later in this section and in Section 3 but, for now, the
reader may think of $\Lambda^{(i)}(n)$ as a sequential log likelihood ratio statistic for testing $H^{(i)}$ vs. $G^{(i)}$,
say. Given desired $\mathrm{FWE}_I(\theta)$ and $\mathrm{FWE}_{II}(\theta)$ bounds $\alpha$ and $\beta \in (0,1)$, respectively, for each
data stream $i$ we assume the existence of critical values $A_s^{(i)} = A_s^{(i)}(\alpha, \beta)$ and $B_s^{(i)} = B_s^{(i)}(\alpha, \beta)$,
$s \in [k]$, such that

$$
P_{\theta^{(i)}}\big(\Lambda^{(i)}(n) \ge B_s^{(i)} \text{ some } n,\ \Lambda^{(i)}(n') > A_1^{(i)} \text{ all } n' < n\big) \le \frac{\alpha}{k-s+1} \quad \text{for all } \theta^{(i)} \in H^{(i)} \tag{2}
$$

$$
P_{\theta^{(i)}}\big(\Lambda^{(i)}(n) \le A_s^{(i)} \text{ some } n,\ \Lambda^{(i)}(n') < B_1^{(i)} \text{ all } n' < n\big) \le \frac{\beta}{k-s+1} \quad \text{for all } \theta^{(i)} \in G^{(i)} \tag{3}
$$

for all $i, s \in [k]$. We will show below that, in most cases, there are standard sequential statistics
that satisfy these error bounds. Without loss of generality we assume that, for each $i \in [k]$,

$$
A_1^{(i)} \le A_2^{(i)} \le \cdots \le A_k^{(i)} < B_k^{(i)} \le B_{k-1}^{(i)} \le \cdots \le B_1^{(i)}. \tag{4}
$$

For example, if the $A_s^{(i)}$ were not non-decreasing in $s$ then they could be replaced by
$\tilde{A}_s^{(i)} = \max\{A_1^{(i)}, \ldots, A_s^{(i)}\}$ for which (3) would still hold; similarly for $B_s^{(i)}$ and (2). Note that, by
(2)-(3), the critical values $A_1^{(i)}, B_1^{(i)}$ are simply the critical values for the sequential test that
samples until

$$
A_1^{(i)} < \Lambda^{(i)}(n) < B_1^{(i)} \tag{5}
$$

is violated, and this test has type I and II error probabilities $\alpha/k$ and $\beta/k$, respectively. The
values $A_s^{(i)}$, $s \in [k]$, are then such that the similar sequential test with critical values $A_s^{(i)}, B_1^{(i)}$
has type II error probability $\beta/(k-s+1)$, and the analogous statement holds for the test with
critical values $A_1^{(i)}, B_s^{(i)}$.

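The sequential multiple testing procedure introduced below will involve ranking the test
statistics associated with different data streams, which may be on completely different scales
in general, so for each stream $i$ we introduce a standardizing function $\varphi^{(i)}(\cdot)$ which will be
applied to the statistic $\Lambda^{(i)}(n)$ before ranking. The standardizing functions $\varphi^{(i)}$ can be any
increasing functions such that $\varphi^{(i)}(A_s^{(i)})$ and $\varphi^{(i)}(B_s^{(i)})$ do not depend on $i$. For simplicity, here
we take the $\varphi^{(i)}$ to be piecewise linear functions such that

$$
\varphi^{(i)}(A_s^{(i)}) = -(k-s+1) \quad\text{and}\quad \varphi^{(i)}(B_s^{(i)}) = k-s+1 \quad \text{for all } s \in [k]. \tag{6}
$$

That is, for $i \in [k]$ define

$$
\varphi^{(i)}(x) =
\begin{cases}
x - A_1^{(i)} - k, & \text{for } x \le A_1^{(i)},\\[4pt]
\dfrac{x - A_s^{(i)}}{A_{s+1}^{(i)} - A_s^{(i)}} - (k-s+1), & \text{for } A_s^{(i)} \le x \le A_{s+1}^{(i)} \text{ if } A_{s+1}^{(i)} > A_s^{(i)},\ 1 \le s < k,\\[8pt]
\dfrac{2\big(x - A_k^{(i)}\big)}{B_k^{(i)} - A_k^{(i)}} - 1, & \text{for } A_k^{(i)} \le x \le B_k^{(i)},\\[8pt]
\dfrac{x - B_s^{(i)}}{B_{s-1}^{(i)} - B_s^{(i)}} + k - s + 1, & \text{for } B_s^{(i)} \le x \le B_{s-1}^{(i)} \text{ if } B_{s-1}^{(i)} > B_s^{(i)},\ 1 < s \le k,\\[8pt]
x - B_1^{(i)} + k, & \text{for } x \ge B_1^{(i)}.
\end{cases}
$$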
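A minimal Python sketch of the piecewise linear standardizing function may help fix ideas. The name `make_standardizer` and the requirement that the $2k$ critical values be strictly ordered (so that every interpolation interval has positive length) are assumptions of this sketch rather than part of the definition above; the critical values in the usage line are those reported in the caption of Table 1.

```python
from bisect import bisect_right

def make_standardizer(A, B):
    """Build phi for one stream from A = [A_1..A_k] and B = [B_1..B_k].

    phi is piecewise linear with phi(A_s) = -(k-s+1) and phi(B_s) = k-s+1,
    and has slope 1 outside the interval [A_1, B_1].
    """
    k = len(A)
    if len(B) != k:
        raise ValueError("A and B must have the same length")
    # Knots in increasing order: A_1 < ... < A_k < B_k < ... < B_1.
    xs = list(A) + list(reversed(B))
    ys = list(range(-k, 0)) + list(range(1, k + 1))
    if any(lo >= hi for lo, hi in zip(xs, xs[1:])):
        raise ValueError("this sketch assumes strictly ordered critical values")

    def phi(x):
        if x <= xs[0]:                 # x <= A_1: x - A_1 - k
            return x - xs[0] + ys[0]
        if x >= xs[-1]:                # x >= B_1: x - B_1 + k
            return x - xs[-1] + ys[-1]
        j = bisect_right(xs, x) - 1    # xs[j] <= x < xs[j+1]
        t = (x - xs[j]) / (xs[j + 1] - xs[j])
        return ys[j] + t * (ys[j + 1] - ys[j])

    return phi

phi = make_standardizer([-2.34, -1.94, -1.27], [1.93, 1.53, 0.86])
print([phi(x) for x in (-2.34, -1.27, 0.86, 1.93)])
```

Because every stream's critical values are mapped to the same integers, standardized statistics from streams on different scales can be ranked against one another directly.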

We shall describe the sequential Holm procedure in terms of stages of sampling, between
which accept/reject decisions are made. Let $\mathcal{I}_j \subseteq [k]$ ($j = 1, 2, \ldots$) denote the index set of
the active hypotheses (i.e., the $H^{(i)}$ which have been neither accepted nor rejected yet) at the
beginning of the $j$th stage of sampling, and $n_j$ will denote the cumulative sample size of any
active test statistic up to and including the $j$th stage. The total number of null hypotheses
that have been rejected (resp. accepted) at the beginning of the $j$th stage will be denoted by
$r_j$ (resp. $a_j$). Accordingly, set $\mathcal{I}_1 = [k]$, $n_0 = 0$, $a_1 = r_1 = 0$, let $|\cdot|$ denote set cardinality, and
fix desired $\mathrm{FWE}_I(\theta)$ and $\mathrm{FWE}_{II}(\theta)$ bounds $\alpha$ and $\beta$, respectively.

2.2 The Sequential Holm Procedure 

The $j$th stage of sampling ($j = 1, 2, \ldots$) proceeds as follows:

1. Sample the active streams $\{X_n^{(i)}\}_{i \in \mathcal{I}_j,\ n > n_{j-1}}$ until $n$ equals

$$
n_j = \inf\left\{ n > n_{j-1} : A_{a_j+1}^{(i)} < \Lambda^{(i)}(n) < B_{r_j+1}^{(i)} \text{ is violated for some } i \in \mathcal{I}_j \right\}. \tag{7}
$$

2. Standardize and order the active test statistics $\tilde{\Lambda}^{(i)}(n_j) = \varphi^{(i)}(\Lambda^{(i)}(n_j))$, $i \in \mathcal{I}_j$, as follows:

$$
\tilde{\Lambda}^{(i(j,1))}(n_j) \le \tilde{\Lambda}^{(i(j,2))}(n_j) \le \cdots \le \tilde{\Lambda}^{(i(j,|\mathcal{I}_j|))}(n_j), \tag{8}
$$

where $i(j, \ell)$ denotes the index of the $\ell$th ordered active standardized statistic at the end
of stage $j$.

3. (a) If the first inequality in (7) was violated for some $i \in \mathcal{I}_j$, i.e., if $\Lambda^{(i)}(n_j) \le A_{a_j+1}^{(i)}$,
then accept the $m_j \ge 1$ null hypotheses

$$
H^{(i(j,1))}, H^{(i(j,2))}, \ldots, H^{(i(j,m_j))},
$$

where

$$
m_j = \min\left\{ m \ge 1 : \tilde{\Lambda}^{(i(j,m+1))}(n_j) > -(k - a_j - m) \right\}, \tag{9}
$$

and set $a_{j+1} = a_j + m_j$. Otherwise set $a_{j+1} = a_j$.

(b) If the second inequality in (7) was violated for some $i \in \mathcal{I}_j$, i.e., if $\Lambda^{(i)}(n_j) \ge B_{r_j+1}^{(i)}$,
then reject the $m'_j \ge 1$ null hypotheses

$$
H^{(i(j,|\mathcal{I}_j|))}, H^{(i(j,|\mathcal{I}_j|-1))}, \ldots, H^{(i(j,|\mathcal{I}_j|-m'_j+1))}, \tag{10}
$$

where

$$
m'_j = \min\left\{ m \ge 1 : \tilde{\Lambda}^{(i(j,|\mathcal{I}_j|-m))}(n_j) < k - r_j - m \right\}, \tag{11}
$$

and set $r_{j+1} = r_j + m'_j$. Otherwise set $r_{j+1} = r_j$.

4. Stop if there are no remaining active hypotheses, i.e., if $a_{j+1} + r_{j+1} = k$. Otherwise, let
$\mathcal{I}_{j+1}$ be the indices of the remaining active hypotheses and continue on to stage $j + 1$.

Before giving an example of this procedure, we make some remarks about its definition. 

(A) There will never be a conflict between the acceptances in Step 3a and the rejections in
Step 3b since if $H^{(i)}$ is accepted at stage $j$ then $i = i(j, m)$ for some $m \le m_j$, hence
$m - 1 < m_j$ so by (9) we have

$$
\tilde{\Lambda}^{(i)}(n_j) = \tilde{\Lambda}^{(i(j,m))}(n_j) \le -(k - a_j - (m-1)) < 0 < k - r_j - (|\mathcal{I}_j| - m),
$$

which shows that the set in (11) must contain the value $|\mathcal{I}_j| - m$, hence $m'_j \le |\mathcal{I}_j| - m$,
or $m \le |\mathcal{I}_j| - m'_j$. This, with (10), shows that $H^{(i)} = H^{(i(j,m))}$ could not have also been
rejected. A similar argument shows that a null hypothesis that is rejected could not also
be accepted at the same stage.

(B) If $k = 1$ then this definition becomes the sequential test (5) of the single null hypothesis
$H^{(1)}$ versus alternative $G^{(1)}$ which has type I and II error probabilities bounded by $\alpha$
and $\beta$, respectively.

(C) Ties in (8) can be broken arbitrarily (at random, say) without affecting the error control
proved in Theorem 2.1 below.

(D) If the same critical values are used for all data streams, that is, if $A_s^{(i)} = A_s^{(i')} = A_s$ and
$B_s^{(i)} = B_s^{(i')} = B_s$ for all $i, i', s \in [k]$, then the standardization performed in Step 2 can
be dispensed with as long as the values to the right of the inequalities in (9) and (11)
are replaced by $A_{a_j+m+1}$ and $B_{r_j+m+1}$, respectively. Error control still holds under these
conditions, which we prove below as part of Theorem 2.1.

Before stating our main result, Theorem 2.1, that this procedure controls both type I
and II familywise error rates, we give a simple example to show the mechanics of the
procedure. Table 1 contains three sample paths in the setting of three pairs of null and
alternative hypotheses about the probability $p^{(i)}$ of success in Bernoulli data $X_n^{(i)}$, $i = 1, 2, 3$.
Here the test statistics $\Lambda^{(i)}(n)$ are taken to be log likelihood ratios

$$
\Lambda^{(i)}(n) = (2S_n^{(i)} - n)\log(.6/.4) \quad\text{where}\quad S_n^{(i)} = \sum_{j=1}^{n} X_j^{(i)}, \tag{12}
$$

for testing

$$
H^{(i)} : p^{(i)} \le .4 \quad\text{vs.}\quad G^{(i)} : p^{(i)} \ge .6, \tag{13}
$$

$i = 1, 2, 3$, about the success probability $p^{(i)}$ of i.i.d. Bernoulli data. This choice of test
statistic and calculation of the critical values given in the table's header will be explained in
detail further below in Section 3.1.1; for now we merely focus on the procedure's decisions to
stop or continue sampling. Per remark (D) we dispense with the standardizing functions and
drop the superscript $(i)$ on the critical values $A_s$, $B_s$. The values of the stopped test statistics
are highlighted in Table 1. On sample path 1, sampling proceeds until time $n_1 = 7$ when $H^{(1)}$
and $H^{(2)}$ are rejected because this is the first time any of the 3 test statistics exceed $B_1$ or fall
below $A_1$. In particular, $H^{(1)}$ is rejected because $\Lambda^{(1)}(7) = 2.03 > B_1 = 1.93$ and $H^{(2)}$ is also
rejected at this time because $\Lambda^{(2)}(7) = 2.03 > B_2 = 1.53$ and one null hypothesis (i.e., $H^{(1)}$)
has already been rejected; the fact that $\Lambda^{(2)}(7)$ also exceeds $B_1$ was not necessary for rejecting
$H^{(2)}$. Next, sampling of stream 3 is continued until time $n_2 = 10$ when $H^{(3)}$ is accepted
because its test statistic falls below $A_1 = -2.34$. Similarly, on sample path 2, after rejecting
$H^{(1)}$ at time $n_1 = 7$, $H^{(2)}$ is then rejected at time $n_2 = 8$ because $\Lambda^{(2)}(8)$ exceeds $B_2 = 1.53$
and one null hypothesis (i.e., $H^{(1)}$) has already been rejected. $H^{(3)}$ is also accepted at time
$n_2 = 8$ for the same reason as above. On sample path 3, all three null hypotheses are rejected
at time $n_1 = 7$ because $\Lambda^{(1)}(7) = 2.03 > B_1$, $\Lambda^{(2)}(7) = 2.03 > B_2$ and one null hypothesis (i.e.,
$H^{(1)}$) has already been rejected, and $\Lambda^{(3)}(7) = 1.22 > B_3$ and two null hypotheses (i.e., $H^{(1)}$
and $H^{(2)}$) have already been rejected.
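The mechanics of the example can be replayed with a short Python sketch of the procedure in its remark (D) form, where all streams share the critical values and no standardization is needed. The function names `llr_path` and `sequential_holm`, the zero-based stream indices, and the rule that ties are ordered by stream index are choices made for this sketch only; the Bernoulli observations are the ones consistent with the statistics listed for sample path 1 in Table 1, and each path is assumed long enough for its stream to reach a decision.

```python
import math
from itertools import accumulate

LOG_ODDS = math.log(0.6 / 0.4)

def llr_path(xs):
    """Statistics (12): (2*S_n - n) * log(.6/.4) for n = 1, ..., len(xs)."""
    return [(2 * s - n) * LOG_ODDS
            for n, s in enumerate(accumulate(xs), start=1)]

def sequential_holm(stats, A, B):
    """Sequential Holm with common critical values A = [A_1..A_k], B = [B_1..B_k].

    stats[i][n-1] is the statistic of stream i at time n.
    Returns {stream index: (decision, stopping time)}.
    """
    active = list(range(len(stats)))
    a = r = 0                      # hypotheses accepted / rejected so far
    n = 0
    out = {}
    while active:
        # Step 1: sample until A_{a+1} < statistic < B_{r+1} fails somewhere.
        while True:
            n += 1
            vals = {i: stats[i][n - 1] for i in active}
            low = any(v <= A[a] for v in vals.values())
            high = any(v >= B[r] for v in vals.values())
            if low or high:
                break
        # Step 2: order the active statistics (stable sort, ties by index).
        order = sorted(active, key=lambda i: vals[i])
        size = len(order)
        accepted, rejected = [], []
        # Step 3(a): work upward from the smallest statistic.
        if low:
            m = 1
            while m < size and vals[order[m]] <= A[a + m]:
                m += 1
            accepted = order[:m]
        # Step 3(b): work downward from the largest statistic.
        if high:
            m = 1
            while m < size and vals[order[size - 1 - m]] >= B[r + m]:
                m += 1
            rejected = order[size - m:]
        for i in accepted:
            out[i] = ("accept", n)
        for i in rejected:
            out[i] = ("reject", n)
        a += len(accepted)
        r += len(rejected)
        # Step 4: continue with whatever is still undecided.
        active = [i for i in active if i not in out]
    return out

A = [-2.34, -1.94, -1.27]
B = [1.93, 1.53, 0.86]
path1 = [llr_path([0, 1, 1, 1, 1, 1, 1]),
         llr_path([1, 0, 1, 1, 1, 1, 1]),
         llr_path([0, 1, 0, 0, 1, 0, 0, 0, 0, 0])]
print(sequential_holm(path1, A, B))
```

On this input the sketch rejects the first two hypotheses at time 7 and accepts the third at time 10, in agreement with the description of sample path 1 above.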

Next we state the result that the sequential Holm procedure controls the familywise error
rates, which is proved in the appendix.

Theorem 2.1. Fix $\alpha, \beta \in (0, 1)$. If the test statistics $\Lambda^{(i)}(n)$, $i \in [k]$, $n \ge 1$, and critical values
$A_s^{(i)} = A_s^{(i)}(\alpha, \beta)$ and $B_s^{(i)} = B_s^{(i)}(\alpha, \beta)$, $i, s \in [k]$, satisfy (2)-(3), then the sequential Holm
procedure defined above in Steps 1-4 satisfies $\mathrm{FWE}_I(\theta) \le \alpha$ and $\mathrm{FWE}_{II}(\theta) \le \beta$ for all $\theta \in \Theta$.
If $A_s^{(i)} = A_s^{(i')} = A_s$ and $B_s^{(i)} = B_s^{(i')} = B_s$ for all $i, i', s \in [k]$, then this conclusion still holds if
we take $\varphi^{(i)}(x) = x$ for all $i \in [k]$ and replace the right-hand sides of the inequalities in (9) and
(11) by $A_{a_j+m+1}$ and $B_{r_j+m+1}$, respectively.

3 Constructing Test Statistics that Satisfy (2)-(3) for Individual Data Streams

Since all that is needed in the above construction of the sequential Holm procedure are
sequential test statistics and critical values satisfying (2)-(3) for each data stream, in this section we
show how to construct them in a few different settings and give some examples.

3.1 Simple Hypotheses and Their Use as Surrogates for Certain Composite Hypotheses

Table 1: Three sample paths of the sequential Holm procedure for $k = 3$ hypotheses about Bernoulli
data using critical values $A_1 = -2.34$, $A_2 = -1.94$, $A_3 = -1.27$, $B_1 = 1.93$, $B_2 = 1.53$, $B_3 = .86$.
The values of the stopped sequential statistics are in bold.

| Sample path | Data stream | | $n=1$ | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | $X_n^{(1)}$ | 0 | 1 | 1 | 1 | 1 | 1 | 1 | | | |
| | | $\Lambda^{(1)}(n)$ | -.41 | .00 | .41 | .81 | 1.22 | 1.62 | **2.03** | | | |
| | 2 | $X_n^{(2)}$ | 1 | 0 | 1 | 1 | 1 | 1 | 1 | | | |
| | | $\Lambda^{(2)}(n)$ | .41 | .00 | .41 | .81 | 1.22 | 1.62 | **2.03** | | | |
| | 3 | $X_n^{(3)}$ | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| | | $\Lambda^{(3)}(n)$ | -.41 | .00 | -.41 | -.81 | -.41 | -.81 | -1.22 | -1.62 | -2.03 | **-2.43** |
| 2 | 1 | $X_n^{(1)}$ | 0 | 1 | 1 | 1 | 1 | 1 | 1 | | | |
| | | $\Lambda^{(1)}(n)$ | -.41 | .00 | .41 | .81 | 1.22 | 1.62 | **2.03** | | | |
| | 2 | $X_n^{(2)}$ | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | | |
| | | $\Lambda^{(2)}(n)$ | .41 | .00 | -.41 | .00 | .41 | .81 | 1.22 | **1.62** | | |
| | 3 | $X_n^{(3)}$ | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | | |
| | | $\Lambda^{(3)}(n)$ | -.41 | .00 | -.41 | -.81 | -1.22 | -1.62 | -2.03 | **-2.43** | | |
| 3 | 1 | $X_n^{(1)}$ | 1 | 0 | 1 | 1 | 1 | 1 | 1 | | | |
| | | $\Lambda^{(1)}(n)$ | .41 | .00 | .41 | .81 | 1.22 | 1.62 | **2.03** | | | |
| | 2 | $X_n^{(2)}$ | 1 | 1 | 1 | 0 | 1 | 1 | 1 | | | |
| | | $\Lambda^{(2)}(n)$ | .41 | .81 | 1.22 | .81 | 1.22 | 1.62 | **2.03** | | | |
| | 3 | $X_n^{(3)}$ | 0 | 1 | 0 | 1 | 1 | 1 | 1 | | | |
| | | $\Lambda^{(3)}(n)$ | -.41 | .00 | -.41 | .00 | .41 | .81 | **1.22** | | | |

In this section we show how to construct the test statistics $\Lambda^{(i)}(n)$ and critical values $\{A_s^{(i)}, B_s^{(i)}\}_{s \in [k]}$
satisfying (2)-(3) for any data stream $i$ such that $H^{(i)}$ and $G^{(i)}$ are both simple hypotheses.
This setting is of interest in practice because many more complicated composite hypotheses
can be reduced to simple hypotheses. In this case the test statistics $\Lambda^{(i)}(n)$ will be taken to
be log-likelihood ratios because of the strong optimality properties of the resulting sequential
probability ratio test (SPRT); see Chernoff (1972). In order to express the likelihood
ratio tests in simple form, we now make the additional assumption that each data stream
$X_1^{(i)}, X_2^{(i)}, \ldots$ constitutes independent and identically distributed data. However, we stress
that this independence assumption is limited to within each stream so that, for example,
elements of $X_1^{(i)}, X_2^{(i)}, \ldots$ may be correlated with (or even identical to) elements of another
stream $X_1^{(i')}, X_2^{(i')}, \ldots$. We represent the simple null and alternative hypotheses $H^{(i)}$ and $G^{(i)}$
by the corresponding distinct density functions $h^{(i)}$ (null) and $g^{(i)}$ (alternative) with respect
to some common $\sigma$-finite measure $\mu^{(i)}$. Formally, the parameter space $\Theta^{(i)}$ corresponding to
this data stream is the set of all densities $f$ with respect to $\mu^{(i)}$, and $H^{(i)}$ is considered true if
the true density $f^{(i)}$ satisfies $f^{(i)} = h^{(i)}$ $\mu^{(i)}$-a.s., and is false if $f^{(i)} = g^{(i)}$ $\mu^{(i)}$-a.s. The SPRT
for testing $H^{(i)} : f^{(i)} = h^{(i)}$ vs. $G^{(i)} : f^{(i)} = g^{(i)}$ with type I and II error probabilities $\alpha$ and $\beta$,
respectively, utilizes the simple log-likelihood ratio test statistic

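$$
\Lambda^{(i)}(n) = \sum_{j=1}^{n} \log\left(\frac{g^{(i)}(X_j^{(i)})}{h^{(i)}(X_j^{(i)})}\right) \tag{14}
$$

and samples sequentially until $\Lambda^{(i)}(n) \le A(\alpha, \beta)$ or $\Lambda^{(i)}(n) \ge B(\alpha, \beta)$, where the critical values
$A(\alpha, \beta)$ and $B(\alpha, \beta)$ satisfy

$$
P_{h^{(i)}}\big(\Lambda^{(i)}(n) \ge B(\alpha, \beta) \text{ some } n,\ \Lambda^{(i)}(n') > A(\alpha, \beta) \text{ all } n' < n\big) \le \alpha \tag{15}
$$

$$
P_{g^{(i)}}\big(\Lambda^{(i)}(n) \le A(\alpha, \beta) \text{ some } n,\ \Lambda^{(i)}(n') < B(\alpha, \beta) \text{ all } n' < n\big) \le \beta. \tag{16}
$$

There are a few different options for computing $A(\alpha, \beta)$ and $B(\alpha, \beta)$ in practice. They may
be computed numerically via Monte Carlo or normal approximation to the log-likelihood
ratio (14), but the most widely-used method is to use the simple, closed-form Wald approximations

$$
A(\alpha, \beta) = \log\left(\frac{\beta}{1-\alpha}\right), \qquad B(\alpha, \beta) = \log\left(\frac{1-\beta}{\alpha}\right). \tag{17}
$$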



See Hoel et al. (1971, Section 3.3.1) or Siegmund (1985) for a derivation. Although, in general,
the inequalities in (15)-(16) only hold approximately when $A(\alpha, \beta)$ and $B(\alpha, \beta)$ are given by
(17), Hoel et al. (1971) show that the actual type I and II error probabilities when using (17)
can only exceed $\alpha$ or $\beta$ by a negligibly small amount in the worst case, and the difference
approaches 0 for small $\alpha$ and $\beta$, which is relevant in the present multiple testing situation
where we will utilize Bonferroni-type cutdowns of $\alpha$ and $\beta$. In what follows in this section
we adopt (17) and use these to construct the critical values $A_s^{(i)}, B_s^{(i)}$ of the sequential Holm
procedure. The extensive simulations performed in Section 4 show that this does not lead to
any exceedances of the desired familywise error rate bounds. Alternative approaches would be
to compute $\{A_s^{(i)}, B_s^{(i)}\}_{s \in [k]}$ via Monte Carlo, as mentioned above, or to replace (17) by $\log \beta$
and $\log \alpha^{-1}$, respectively, for which (15)-(16) always hold (see Hoel et al., 1971) and proceed
similarly, but we do not explore those options here.

The next theorem shows that, neglecting Wald's approximation, the following simple
expressions (18) can be used for the critical values in the sequential Holm procedure. Specifically,
we show that the left-hand sides of (2)-(3) are equal to the right-hand sides of (19)-(20), and
hence the inequalities in (2)-(3) hold, up to Wald's approximation.

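Theorem 3.1. Suppose that, for a certain data stream $i$, $H^{(i)} : f^{(i)} = h^{(i)}$ and $G^{(i)} : f^{(i)} = g^{(i)}$
are simple hypotheses. Let $\alpha_{\mathrm{Wald}}^{(i)}(\alpha, \beta)$ and $\beta_{\mathrm{Wald}}^{(i)}(\alpha, \beta)$ be the values of the probabilities on
the left-hand sides of (15) and (16), respectively, with $\Lambda^{(i)}(n)$ given by (14) and $A(\alpha, \beta)$ and
$B(\alpha, \beta)$ given by the Wald approximations (17). Now fix $\alpha, \beta \in (0, 1)$ and for $s \in [k]$ let

$$
\alpha_s = \alpha_s(\alpha, \beta) = \frac{(k-s+1-\beta)\,\alpha}{(k-s+1)(k-\beta)}, \qquad
\beta_s = \beta_s(\alpha, \beta) = \frac{(k-s+1-\alpha)\,\beta}{(k-s+1)(k-\alpha)}.
$$

Also, let $\alpha_{\mathrm{Holm}}^{(i)}(s)$ and $\beta_{\mathrm{Holm}}^{(i)}(s)$ denote the left-hand sides of (2) and (3), respectively, with
$A_s^{(i)}, B_s^{(i)}$ given by

$$
A_s^{(i)} = A_s^{(i)}(\alpha, \beta) = \log\left(\frac{\beta}{(1-\alpha_s)(k-s+1)}\right), \qquad
B_s^{(i)} = B_s^{(i)}(\alpha, \beta) = \log\left(\frac{(1-\beta_s)(k-s+1)}{\alpha}\right). \tag{18}
$$

Then, for all $s \in [k]$,

$$
\alpha_{\mathrm{Holm}}^{(i)}(s) = \alpha_{\mathrm{Wald}}^{(i)}\big(\alpha/(k-s+1),\ \beta_s\big) \tag{19}
$$

and

$$
\beta_{\mathrm{Holm}}^{(i)}(s) = \beta_{\mathrm{Wald}}^{(i)}\big(\alpha_s,\ \beta/(k-s+1)\big), \tag{20}
$$

and therefore (2)-(3) hold, up to Wald's approximation, when using the critical values (18).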
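As a consistency check of (18) against the Wald formulas (17), the following worked computation (a sketch, not reproduced from the appendix proof) verifies that the Wald test with nominal error probabilities $(\alpha/(k-s+1), \beta_s)$ has exactly the boundaries $A_1^{(i)}$ and $B_s^{(i)}$, which is the pair of boundaries appearing in (2). It uses only the definitions of $\alpha_s$ and $\beta_s$ in the theorem, together with $\alpha_1 = \alpha/k$.

```latex
\begin{align*}
B\!\left(\tfrac{\alpha}{k-s+1},\,\beta_s\right)
  &= \log\frac{1-\beta_s}{\alpha/(k-s+1)}
   = \log\frac{(1-\beta_s)(k-s+1)}{\alpha}
   = B_s^{(i)},\\[4pt]
A\!\left(\tfrac{\alpha}{k-s+1},\,\beta_s\right)
  &= \log\frac{\beta_s}{1-\alpha/(k-s+1)}
   = \log\left[\frac{(k-s+1-\alpha)\,\beta}{(k-s+1)(k-\alpha)}
       \cdot\frac{k-s+1}{k-s+1-\alpha}\right]
   = \log\frac{\beta}{k-\alpha},\\[4pt]
A_1^{(i)}
  &= \log\frac{\beta}{(1-\alpha_1)\,k}
   = \log\frac{\beta}{(1-\alpha/k)\,k}
   = \log\frac{\beta}{k-\alpha}.
\end{align*}
```

The symmetric computation with the roles of $\alpha$ and $\beta$ exchanged gives the boundaries $A_s^{(i)}$ and $B_1^{(i)}$ for the nominal pair $(\alpha_s, \beta/(k-s+1))$ appearing in (20).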

The theorem gives simple, closed-form critical values (18) that can be used in lieu of Monte
Carlo or other methods of calculating the $2k$ critical values $\{A_s^{(i)}, B_s^{(i)}\}_{s \in [k]}$ for a stream $i$ whose
hypotheses $H^{(i)}, G^{(i)}$ are simple. Example values of (18) for $\alpha = .05$ and $\beta = .2$ are given in
Table 2 for $k = 2, \ldots, 10$.
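The closed-form expressions lend themselves to a few lines of code. The following Python sketch evaluates (17) and (18); the function names `wald_bounds` and `holm_critical_values` are illustrative choices and not part of the original development.

```python
import math

def wald_bounds(alpha, beta):
    """Wald approximations (17): returns the pair (A, B)."""
    return math.log(beta / (1 - alpha)), math.log((1 - beta) / alpha)

def holm_critical_values(k, alpha, beta):
    """Critical values (18) for s = 1, ..., k, returned as two lists (A, B)."""
    A, B = [], []
    for s in range(1, k + 1):
        c = k - s + 1
        alpha_s = (c - beta) * alpha / (c * (k - beta))
        beta_s = (c - alpha) * beta / (c * (k - alpha))
        A.append(math.log(beta / ((1 - alpha_s) * c)))
        B.append(math.log((1 - beta_s) * c / alpha))
    return A, B

A, B = holm_critical_values(3, 0.05, 0.2)
print([round(x, 2) for x in A], [round(x, 2) for x in B])
```

The printed values can be compared with the $k = 3$ rows of Table 2.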

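Table 2: Critical values (18) of the sequential Holm procedure for simple hypotheses, for $\alpha = .05$,
$\beta = .2$, and $k = 2, \ldots, 10$ to two decimal places. For each $k$, the first row gives $A_1, \ldots, A_k$ and
the second row gives $B_1, \ldots, B_k$.

| $k$ | | $s=1$ | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | $A_s$ | -2.28 | -1.59 | | | | | | | | |
| | $B_s$ | 3.58 | 2.89 | | | | | | | | |
| 3 | $A_s$ | -2.69 | -2.29 | -1.60 | | | | | | | |
| | $B_s$ | 4.03 | 3.62 | 2.93 | | | | | | | |
| 4 | $A_s$ | -2.98 | -2.70 | -2.29 | -1.60 | | | | | | |
| | $B_s$ | 4.33 | 4.04 | 3.64 | 2.95 | | | | | | |
| 5 | $A_s$ | -3.21 | -2.99 | -2.70 | -2.29 | -1.60 | | | | | |
| | $B_s$ | 4.56 | 4.34 | 4.05 | 3.65 | 2.96 | | | | | |
| 6 | $A_s$ | -3.39 | -3.21 | -2.99 | -2.70 | -2.29 | -1.60 | | | | |
| | $B_s$ | 4.75 | 4.57 | 4.35 | 4.06 | 3.66 | 2.96 | | | | |
| 7 | $A_s$ | -3.55 | -3.39 | -3.21 | -2.99 | -2.70 | -2.30 | -1.60 | | | |
| | $B_s$ | 4.91 | 4.76 | 4.58 | 4.35 | 4.07 | 3.66 | 2.97 | | | |
| 8 | $A_s$ | -3.68 | -3.55 | -3.39 | -3.21 | -2.99 | -2.70 | -2.30 | -1.60 | | |
| | $B_s$ | 5.05 | 4.92 | 4.76 | 4.58 | 4.36 | 4.07 | 3.66 | 2.97 | | |
| 9 | $A_s$ | -3.80 | -3.68 | -3.55 | -3.40 | -3.21 | -2.99 | -2.70 | -2.30 | -1.60 | |
| | $B_s$ | 5.17 | 5.05 | 4.92 | 4.77 | 4.58 | 4.36 | 4.07 | 3.67 | 2.97 | |
| 10 | $A_s$ | -3.91 | -3.80 | -3.68 | -3.55 | -3.40 | -3.21 | -2.99 | -2.70 | -2.30 | -1.61 |
| | $B_s$ | 5.28 | 5.17 | 5.05 | 4.92 | 4.77 | 4.59 | 4.36 | 4.07 | 3.67 | 2.98 |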



3.1.1 Example: Exponential families 



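Suppose that a certain data stream $i$ is comprised of i.i.d. $d$-dimensional random vectors
$X_1^{(i)}, X_2^{(i)}, \ldots$ from a multiparameter exponential family of densities

$$
X_n^{(i)} \sim f_{\theta^{(i)}}(x) = \exp\big[\theta^{(i)T} x - \psi^{(i)}(\theta^{(i)})\big], \quad n = 1, 2, \ldots, \tag{21}
$$

where $\theta^{(i)}$ and $x$ are $d$-vectors, $(\cdot)^T$ denotes transpose, $\psi^{(i)} : \mathbb{R}^d \to \mathbb{R}$ is the cumulant generating
function, and it is desired to test

$$
H^{(i)} : \theta^{(i)} = \eta \quad\text{vs.}\quad G^{(i)} : \theta^{(i)} = \gamma
$$

for given $\eta, \gamma \in \mathbb{R}^d$. Letting $S_n^{(i)} = \sum_{j=1}^{n} X_j^{(i)}$, the log-likelihood ratio (14) in this case is

$$
\Lambda^{(i)}(n) = (\gamma - \eta)^T S_n^{(i)} - n\big[\psi^{(i)}(\gamma) - \psi^{(i)}(\eta)\big] \tag{22}
$$

and, by Theorem 3.1, the critical values (18) can be used, which satisfy (2)-(3) up to Wald's
approximation.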

As mentioned above, many more complicated testing situations reduce to this setting. For
example, the Bernoulli example (12) for testing (13) can be reduced to testing $p^{(i)} = .4$ vs.
$p^{(i)} = .6$ by considering the worst-case error probabilities under the hypotheses (13), hence (12)
is given by (22) with $\theta^{(i)} = \log[p^{(i)}/(1 - p^{(i)})]$, $\psi^{(i)}(\theta^{(i)}) = -\log(1 - p^{(i)})$, $\eta = \log(.4/.6) = -\gamma$,
and the critical values in Table 1 are given by (18) with $\alpha = \beta = .25$, this value chosen merely
to produce short sample paths for the sake of the example.
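The reduction of (12) to (22) is easy to confirm numerically. The Python sketch below evaluates (22) for a scalar natural parameter and checks it against (12); the function names are illustrative, and the restriction to the one-dimensional case ($d = 1$) is an assumption made to keep the example short.

```python
import math

def llr_expfam(s_n, n, eta, gamma, psi):
    """Log-likelihood ratio (22) for a one-parameter exponential family."""
    return (gamma - eta) * s_n - n * (psi(gamma) - psi(eta))

def psi_bernoulli(theta):
    """Bernoulli cumulant function: log(1 + e^theta) = -log(1 - p)."""
    return math.log1p(math.exp(theta))

def llr_bernoulli(s_n, n):
    """Statistic (12) for testing p = .4 vs. p = .6."""
    return (2 * s_n - n) * math.log(0.6 / 0.4)

eta = math.log(0.4 / 0.6)   # natural parameter under the null
gamma = -eta                # natural parameter under the alternative
for s_n, n in [(0, 1), (3, 5), (6, 7), (2, 10)]:
    via_22 = llr_expfam(s_n, n, eta, gamma, psi_bernoulli)
    via_12 = llr_bernoulli(s_n, n)
    print(n, s_n, round(via_22, 2), round(via_12, 2))
```

Each printed row should show the same value twice, reflecting that $(\gamma - \eta) = 2\log(.6/.4)$ and $\psi(\gamma) - \psi(\eta) = \log(.6/.4)$ for this choice of $\eta$ and $\gamma$.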

3.2 Other Composite Hypotheses 

While many composite hypotheses can be reduced to the simple-vs.-simple situation in
Section 3.1, the generality of Theorem 2.1 does not require this and allows any type of hypotheses
to be tested as long as the corresponding sequential statistics satisfy (2)-(3). In this section
we discuss the more general case of how to proceed to apply Theorem 2.1 when a certain data
stream $i$ is described by a multiparameter exponential family (21) but simple hypotheses are
not appropriate. Let

$$
I(\theta^{(i)}, \lambda^{(i)}) = (\theta^{(i)} - \lambda^{(i)})^T \nabla\psi^{(i)}(\theta^{(i)}) - \big[\psi^{(i)}(\theta^{(i)}) - \psi^{(i)}(\lambda^{(i)})\big]
$$

denote the Kullback-Leibler information number for the distribution (21), and suppose it is desired to
test

$$
H^{(i)} : u(\theta^{(i)}) \le u_0 \quad\text{vs.}\quad G^{(i)} : u(\theta^{(i)}) \ge u_1 \tag{23}
$$

where $u(\cdot)$ is a continuously differentiable real-valued function such that

$$
\text{for all fixed } \theta^{(i)},\quad I(\theta^{(i)}, \lambda^{(i)}) \text{ is }
\left\{\begin{array}{l}\text{decreasing}\\ \text{increasing}\end{array}\right\}
\text{ in } u(\lambda^{(i)})
\left\{\begin{array}{l} < \\ > \end{array}\right\}
u(\theta^{(i)})
$$

and $u_0 < u_1$ are chosen real numbers. The family of models (21) and general form of the
hypotheses (23) contain a large number of situations frequently encountered in practice, including
various two-population tests that occur frequently in randomized Phase II and Phase III
clinical trials; see Bartroff et al. (2012, Chapter 4). Of course there are many composite hypotheses
encountered in practice which do not fit into the form (23), such as

$$
H^{(i)} : \theta^{(i)} = \theta_0^{(i)} \quad\text{vs.}\quad G^{(i)} : \theta^{(i)} \ne \theta_0^{(i)} \tag{24}
$$

for some fixed $\theta_0^{(i)}$. However, by considering true values of $\theta^{(i)}$ arbitrarily close to $\theta_0^{(i)}$, it is
clear that no test of (24) can control the type II error probability for all $\theta^{(i)} \in G^{(i)}$ in general,
and since the focus here is on tests that control both the type I and II familywise error rates,
one would need to restrict $G^{(i)}$ in some way for that to be possible, for example by modifying
$G^{(i)}$ to be only the $\theta^{(i)}$ such that $\|\theta^{(i)} - \theta_0^{(i)}\| \ge \delta$ for some $\delta > 0$. But this restricted form
fits into the framework (23) by choosing $u(\theta^{(i)}) = \|\theta^{(i)} - \theta_0^{(i)}\|$, $u_0 = 0$, and $u_1 = \delta$. So
although it is not natural to test (24) in the current framework of simultaneous type I and II
familywise error control, it is natural to test (24) when only type I familywise error control
is strictly required, and that problem has already been addressed in the sequential multiple
testing setting by Bartroff and Lai (2010).

The hypotheses (23) can be tested with sequential generalized likelihood ratio (GLR)
statistics, as follows. Letting

$$
\hat{\theta}_n^{(i)} = \big(\nabla\psi^{(i)}\big)^{-1}\left(\frac{1}{n}\sum_{j=1}^{n} X_j^{(i)}\right)
$$

denote the maximum likelihood estimate (MLE) of $\theta^{(i)}$ based on the data from the first $n$
observations, define

$$
\Lambda_H^{(i)}(n) = n \inf_{\lambda :\, u(\lambda) = u_0} I(\hat{\theta}_n^{(i)}, \lambda), \tag{25}
$$

$$
\Lambda_G^{(i)}(n) = n \inf_{\lambda :\, u(\lambda) = u_1} I(\hat{\theta}_n^{(i)}, \lambda), \tag{26}
$$

$$
\Lambda^{(i)}(n) =
\begin{cases}
+\sqrt{2\Lambda_H^{(i)}(n)}, & \text{if } u(\hat{\theta}_n^{(i)}) > u_0 \text{ and } \Lambda_H^{(i)}(n) > \Lambda_G^{(i)}(n),\\[4pt]
-\sqrt{2\Lambda_G^{(i)}(n)}, & \text{otherwise.}
\end{cases}
\tag{27}
$$

The statistics (25) and (26) are the log-GLR statistics for testing against $H^{(i)}$ and against
$G^{(i)}$, respectively, whose signed roots in (27) have standard normal large-$n$ limiting
distribution under $u(\theta^{(i)}) = u_0$ and $u_1$, respectively; see Jennison and Turnbull (1997, Theorem 2),
whose results further show that under group sequential sampling, the signed-root statistics
have asymptotically independent increments, a fact which can be used with random walk
theory to find the critical values $\{A_s^{(i)}, B_s^{(i)}\}_{s \in [k]}$ for $\Lambda^{(i)}(n)$ (see Bartroff et al., 2012,
Chapter 4). However, our simulation studies have shown that under the fully-sequential sampling
considered here, the small-$n$ behavior of these statistics can deviate substantially from the
standard normal random walk and therefore we advocate Monte Carlo determination of the
critical values $\{A_s^{(i)}, B_s^{(i)}\}_{s \in [k]}$ for $\Lambda^{(i)}(n)$, which then allows their inclusion in the sequential
Holm procedure.



4 Simulation Studies 



In this section, we compare the sequential Holm procedure (denoted SH) with the fixed-sample
Holm (1979) procedure (denoted FH), the sequential Bonferroni procedure (denoted SB), and
the sequential intersection scheme (denoted IS) proposed by De and Baron (2012b).
The SB procedure uses a SPRT on each data stream with error probability bounds $\alpha/k$ and
$\beta/k$ via the Wald approximations (17). That is, for each $i \in [k]$, SB samples the $i$th stream
until its SPRT continuation condition is violated, with critical values
$A^{(i)} = \log[(\beta/k)/(1 - \alpha/k)]$ and $B^{(i)} = \log[(1 - \beta/k)/(\alpha/k)]$. The
three sequential procedures SH, SB, and IS are the only ones we know of that control both
$\mathrm{FWE}_I$ and $\mathrm{FWE}_{II}$. In our studies we have chosen the commonly used values
$\alpha = .05$ and $\beta = .2$, i.e., familywise power at least 80%. This same value of $\alpha$ is
used for the fixed-sample Holm procedure as well, which does not guarantee $\mathrm{FWE}_{II}$ control
at a prescribed level, so we have chosen its sample size to make its familywise power approximately
the same as that of the SH procedure in order to have a meaningful comparison with the sequential
procedures. Below we present two sets of simulations, the first in Table 3 with independent streams of
Bernoulli data, and the second in Table 4 with dependent streams of normal data generated
from a multivariate normal distribution with non-identity covariance matrix. For each scenario
considered below, $\mathrm{FWE}_I$, $\mathrm{FWE}_{II}$, the expected total sample size
$EN = E(\sum_{i=1}^{k} N^{(i)})$ of all the data streams, where $N^{(i)}$ is the total sample size of
the $i$th stream, and the relative savings in sample size of SH are estimated from 100,000 Monte Carlo
simulated batteries of $k$ sequential tests. In each set of simulations, the data streams and
hypotheses tested are similar for each data stream; we emphasize that this is only for the sake of
getting a clear picture of the procedures' performance, and this uniformity is not required in order
to be able to use any of the procedures considered.
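
As a small illustration of the SB boundaries quoted above, the following sketch evaluates the two Wald-approximation critical values at the Bonferroni-adjusted levels $\alpha/k$ and $\beta/k$. The function name `sb_boundaries` is hypothetical and is not taken from the paper.

```python
import math

def sb_boundaries(alpha: float, beta: float, k: int) -> tuple[float, float]:
    """Wald-approximation SPRT boundaries at the adjusted levels alpha/k and beta/k."""
    a, b = alpha / k, beta / k
    lower = math.log(b / (1.0 - a))   # accept H once the statistic falls to or below this
    upper = math.log((1.0 - b) / a)   # reject H once the statistic rises to or above this
    return lower, upper

# Boundaries for the values of k used in the Bernoulli study, with alpha = .05, beta = .2.
for k in (1, 2, 5, 10, 20):
    lower, upper = sb_boundaries(0.05, 0.2, k)
    print(f"k={k:2d}  lower={lower:7.3f}  upper={upper:6.3f}")
```

Both boundaries move away from zero as $k$ grows, which is one way to see why SB needs more observations per stream for larger batteries.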



4.1 Setting 1: Independent Bernoulli Data Streams 

Table 3 contains the operating characteristics of the above procedures for testing $k$ hypotheses
of the form

    $H^{(i)}: p^{(i)} \le .4$  vs.  $G^{(i)}: p^{(i)} \ge .6$,  $i = 1, \ldots, k$,

about the probability $p^{(i)}$ of success in the $i$th stream of i.i.d. Bernoulli data; additionally, the
streams were generated independently of each other. The individual test statistics (12) were
used and the SH procedure used the tabulated critical values described in Section 3.1.
The data was generated for each data stream with $p^{(i)} = .4$ or $.6$ and the second column of
Table 3 gives the number of hypotheses for which $p^{(i)} = .4$. The columns labeled Savings give
the percent decrease in expected sample size $EN$ of the SH procedure relative to each other procedure.
The SH procedure has substantially smaller sample size compared to the other three, saving
more than 50% compared to FH and IS in each scenario with $k > 5$. Like its fixed-sample
analog, the SB procedure is conservative in that its attained error rates $\mathrm{FWE}_I$ and
$\mathrm{FWE}_{II}$ are much smaller than the prescribed levels, and the IS also suffers from this,
perhaps as a result of its stringent stopping rule and resulting large average sample size. In fact,
the IS has $\mathrm{FWE}_I$ and $\mathrm{FWE}_{II}$ even smaller than the SB procedure. The FH procedure
handles its error rate more judiciously than the SB and IS procedures due to its step-down structure,
and the SH procedure has very similar error rates to FH but with much smaller expected sample sizes.
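
For intuition about the $k = 1$ rows of Table 3, where the SH and SB columns coincide, the sketch below treats a single Bernoulli stream as a plain SPRT. It assumes the stream statistic is the log-likelihood ratio of $p = .6$ versus $p = .4$ with the Wald-approximation boundaries; the paper's statistic (12) is not restated here, so this is an assumption. Under it the statistic is a simple random walk, and the classical gambler's-ruin formulas give the error probability and expected sample size exactly. The function name is hypothetical.

```python
import math

def sprt_walk_characteristics(p, p0=0.4, p1=0.6, alpha=0.05, beta=0.2):
    """Rejection probability and expected sample size of a Bernoulli SPRT.

    Assumes p1 = 1 - p0, so each observation moves the log-likelihood ratio
    by +step (success) or -step (failure): a simple random walk.
    """
    step = math.log(p1 / p0)
    lower = math.log(beta / (1.0 - alpha))         # Wald approximation, accept H
    upper = math.log((1.0 - beta) / alpha)         # Wald approximation, reject H
    a = math.ceil(-lower / step)                   # net failures needed to accept H
    b = math.ceil(upper / step)                    # net successes needed to reject H
    r = (1.0 - p) / p                              # gambler's-ruin ratio q/p
    p_reject = (1.0 - r**a) / (1.0 - r**(a + b))   # P(reach +b before -a)
    mean_end = b * p_reject - a * (1.0 - p_reject) # expected stopped position, in steps
    expected_n = mean_end / (2.0 * p - 1.0)        # Wald's identity
    return p_reject, expected_n

type1, en_null = sprt_walk_characteristics(0.4)    # stream with H true
reject_alt, en_alt = sprt_walk_characteristics(0.6)  # stream with G true
type2 = 1.0 - reject_alt
print(round(type1, 3), round(en_null, 1), round(type2, 3), round(en_alt, 1))
```

Under these assumptions the four quantities come out at roughly 0.048, 17.4, 0.188 and 24.7, which is close to the simulated $k = 1$ entries of Table 3.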



Table 3: Operating characteristics of sequential and fixed-sample multiple testing procedures for k
streams of independent Bernoulli data. (# true = number of true null hypotheses $H^{(i)}$.)

           |          SH          |              FH              |              SB              |              IS
 k  # true | FWE_I FWE_II      EN | FWE_I FWE_II      EN Savings | FWE_I FWE_II      EN Savings | FWE_I FWE_II      EN Savings
 1       1 | 0.048      -    17.5 |     -      -       -       - | 0.048      -    17.5    0.0% | 0.031      -    18.1    3.8%
 1       0 |     -  0.190    24.6 |     -  0.194      42   41.5% |     -  0.190    24.6    0.0% |     -  0.194    28.6   14.1%
 2       2 | 0.045      -    47.6 |     -      -       -       - | 0.048      -    56.3   15.4% | 0.025      -    64.7   26.4%
 2       1 | 0.029  0.135    63.0 | 0.029  0.135     126   50.0% | 0.025  0.086    66.7    5.6% | 0.021  0.120    97.2   35.2%
 2       0 |     -  0.165    72.7 |     -  0.168     126   42.3% |     -  0.161    77.1    5.8% |     -  0.091   104.0   30.1%
 5       3 | 0.034  0.105   216.7 | 0.039  0.108     485   55.3% | 0.022  0.077   230.2    5.9% | 0.010  0.044   474.9   54.4%
 5       2 | 0.028  0.127   230.7 | 0.033  0.128     490   52.9% | 0.015  0.112   247.1    6.6% | 0.012  0.044   439.9   47.6%
10       8 | 0.034  0.070   479.9 | 0.029  0.075    1200   60.0% | 0.027  0.034   532.6    9.9% | 0.011  0.026  1144.8   58.1%
10       5 | 0.027  0.111   549.6 | 0.045  0.112    1240   55.7% | 0.017  0.085   587.1    6.4% | 0.008  0.023  1295.0   57.6%
10       2 | 0.016  0.130   579.4 | 0.033  0.132    1180   50.9% | 0.007  0.129   642.8    9.9% | 0.006  0.022  1298.2   55.4%
20      16 | 0.035  0.067  1129.8 | 0.047  0.072    2860   60.5% | 0.035  0.029  1250.8    9.7% | 0.006  0.011  3095.6   63.5%
20      10 | 0.027  0.108  1273.2 | 0.045  0.113    3040   58.1% | 0.022  0.073  1336.5    4.7% | 0.004  0.013  3406.0   62.6%
20       4 | 0.017  0.137  1332.6 | 0.035  0.138    2740   51.4% | 0.009  0.116  1421.9    6.3% | 0.004  0.015  3344.1   60.2%

4.2 Setting 2: Correlated Normal Data Streams

Table 4 contains the operating characteristics of the four procedures described above for testing
$k$ hypotheses of the form

    $H^{(i)}: \theta^{(i)} \le 0$  vs.  $G^{(i)}: \theta^{(i)} \ge \delta$,  $i = 1, \ldots, k$,

for known $\delta > 0$, taken here to be 1, about the mean $\theta^{(i)}$ of i.i.d. normal observations with
known variance 1, which make up the $i$th data stream. To investigate the performance of the
procedures under dependent data streams, the streams were generated from a $k$-dimensional
multivariate normal distribution with mean $\theta = (\theta^{(1)}, \ldots, \theta^{(k)})$, given in the third
column of Table 4, and four different non-identity covariance matrices $M_1$, $M_2$, $M_3$, and $M_4$,
given in the Appendix, which were chosen to give a variety of different scenarios of positively and
negatively correlated data streams. The various combinations of correlations with true or false
null hypotheses all show behavior somewhat similar to the case of independent data
streams in the previous section, in that the SH procedure has substantially smaller expected
sample size than the other three procedures in all cases, more than a 30% reduction in most
cases, and that the SH procedure has $\mathrm{FWE}_I$ and $\mathrm{FWE}_{II}$ much closer to the
prescribed values $\alpha$ and $\beta$ than the other two sequential procedures SB and IS, and similar to
the FH procedure in most cases. Because the SH procedure causes more early stopping, it is interesting
to note that its error control is less conservative than the other sequential procedures even in
cases when data streams with true null hypotheses are positively correlated with streams
having false null hypotheses, such as the second case of the $M_1$-generated data and the third
case of the $M_3$-generated data.
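
To make the data-generating setup concrete, here is a minimal sketch of drawing two correlated unit-variance normal streams and tracking a running test statistic for each. The statistic used is the log-likelihood ratio of $N(\delta, 1)$ versus $N(0, 1)$, which is an assumption made for illustration (the paper's statistic for this setting is not restated in this section); the mean vector, seed, and horizon `n_max` are likewise hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed so the sketch is reproducible
delta = 1.0                               # alternative mean, as in the text
theta = np.array([1.0, 0.0])              # hypothetical pair of stream means
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])              # unit variances, correlation 0.8

n_max = 50                                # hypothetical horizon for the illustration
x = rng.multivariate_normal(theta, cov, size=n_max)   # row n holds (X_n^(1), X_n^(2))

# Log-likelihood ratio of N(delta, 1) versus N(0, 1) after n observations:
# Lambda(n) = delta * S_n - n * delta^2 / 2, with S_n the running sum of a stream.
n = np.arange(1, n_max + 1)[:, None]
llr = delta * np.cumsum(x, axis=0) - n * delta**2 / 2.0
print(llr[:5])
```

Each column of `llr` is one stream's statistic path; the correlation between streams enters only through the joint draw, not through the per-stream statistic.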

Table 4: Operating characteristics of sequential and fixed-sample multiple testing procedures for k
streams of correlated Normal data.

                             |           SH           |               FH               |               SB               |               IS
Covariance  k  true theta    |   FWE_I  FWE_II     EN |   FWE_I  FWE_II     EN Savings |   FWE_I  FWE_II     EN Savings |   FWE_I  FWE_II     EN Savings
M_1         2  (1,1)         |   0.024       -   10.4 |       -       -      -       - |   0.025       -   11.6   10.0% |   0.009       -   11.7   11.3%
M_1         2  (1,0)         |   0.029   0.110   12.8 |   0.050   0.113     20   35.9% |   0.015   0.057   13.6    5.7% |   0.027   0.108   19.5   34.3%
M_1         2  (0,0)         |       -   0.087   14.3 |       -   0.102     20   28.5% |       -   0.086   15.6    8.5% |       -   0.046   16.7   14.4%
M_2         2  (1,1)         |   0.029       -   10.2 |       -       -      -       - |   0.030       -   11.6   11.9% |   0.025       -   15.1   32.5%
M_2         2  (1,0)         |   0.015   0.063   13.5 |   0.037   0.082     22   38.8% |   0.015   0.057   13.6    0.8% |   0.009   0.016   17.9   24.8%
M_2         2  (0,0)         |       -   0.114   14.4 |       -   0.128     20   28.1% |       -   0.113   15.6    8.0% |       -   0.103   20.3   29.3%
M_3         4  (1,1,1,1)     |   0.024       -   24.6 |       -       -      -       - |   0.025       -   29.1   15.6% |   0.007       -   38.8   36.7%
M_3         4  (1,1,0,0)     |   0.013   0.051   32.4 |   0.032   0.058     60   46.0% |  0.0011   0.044   33.8    4.0% |   0.001   0.003   46.9   30.9%
M_3         4  (1,0,1,0)     |   0.020   0.080   32.2 |   0.043   0.102     56   42.5% |   0.015   0.058   33.8    4.7% |   0.010   0.021   59.6   46.0%
M_3         4  (0,0,0,0)     |       -   0.089   34.1 |       -   0.115     48   29.1% |       -   0.090   38.4   11.4% |       -   0.029   52.6   35.3%
M_4         6  (1,1,1,1,1,1) |   0.022       -   40.7 |       -       -      -       - |   0.022       -   48.8   16.5% |   0.002       -   67.1   39.2%
M_4         6  (1,1,1,1,1,0) |   0.021   0.032   46.3 |   0.038   0.032    108   57.2% |   0.020   0.019   51.2    9.6% |   0.003   0.002   85.2   45.7%
M_4         6  (1,1,1,1,0,0) |   0.018   0.041   50.4 |   0.038   0.041    108   53.3% |   0.016   0.031   53.7    6.1% |   0.007   0.001   90.0   44.0%
M_4         6  (1,1,0,1,1,0) |   0.018   0.068   53.4 |   0.039   0.073    102   47.7% |   0.013   0.049   56.1    4.9% |   0.004   0.014  102.3   47.8%
M_4         6  (1,1,1,0,0,0) |   0.012   0.047   53.6 |   0.030   0.058    102   47.5% |   0.012   0.041   56.2    4.7% | < 0.001 < 0.001   83.0   35.4%
M_4         6  (1,1,0,0,0,0) |   0.011   0.072   55.5 |   0.031   0.089     96   42.2% |   0.008   0.061   58.6    5.3% | < 0.001   0.022   97.8   43.3%
M_4         6  (1,0,0,1,0,0) |   0.016   0.074   55.3 |   0.045   0.092     96   42.4% |   0.010   0.059   58.7    5.7% |   0.005   0.008  104.1   46.9%
M_4         6  (1,0,0,0,0,0) |   0.008   0.077   56.3 |   0.040   0.099     90   37.5% |   0.005   0.071   61.1    7.9% |   0.001   0.011   97.8   42.4%
M_4         6  (0,0,0,0,0,0) |       -   0.082   55.7 |       -   0.087     84   33.7% |       -   0.082   63.6   12.5% |       -   0.009   89.4   37.8%



5 Discussion 

The sequential Holm procedure proposed herein is a general method for combining individual
sequential tests into a sequential multiple hypothesis testing procedure which controls both the
type I and II familywise error rates at prescribed levels without requiring the statistician to
have any knowledge or model of the data streams' correlation structure, a desirable property
that it inherits from Holm's fixed-sample procedure. In our simulations in Section 4 the
sequential Holm procedure is much more efficient, in terms of smaller average total
sample size, than existing sequential procedures as well as Holm's fixed-sample test. In terms
of achieved familywise error rates, our simulations suggest that the sequential Holm procedure
occupies a "middle ground" between existing sequential procedures, which have very
conservative error rates and large average sample sizes, and the fixed-sample Holm test which
achieves error rates closest to the prescribed values of all the procedures considered, but has
still larger sample size and lacks the flexibility and adaptive nature of the sequential procedures.
We summarize our recommendations for using the sequential Holm procedure in practice
as follows.

• For data streams whose hypotheses are simple, or are composite but can be reduced to
considering simple hypotheses, we recommend using the sequential log-likelihood ratio
statistic (14) with the closed-form critical values (18).

• For data streams with composite hypotheses of the form (23), we recommend using the
sequential generalized likelihood ratio statistic (27) and determining the critical values
$\{A_s^{(i)}, B_s^{(i)}\}_{s \in [k]}$ to satisfy (2)-(3) by Monte Carlo. For group-sequential sampling
with moderate group size the critical values can be determined by normal approximation.

• Data streams with still other forms of hypotheses or test statistics (e.g., nonparametric)
can be included in the sequential Holm procedure by determining critical values
$\{A_s^{(i)}, B_s^{(i)}\}_{s \in [k]}$ satisfying (2)-(3) by Monte Carlo or other methods.
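
As a rough illustration of what a Monte Carlo check of one candidate pair of critical values might look like, the sketch below estimates a crossing probability of the type bounded in the proof of Theorem 2.1: the chance that a stream's statistic falls to a lower value `a_s` before ever reaching an upper value `b_1`. It assumes a single $N(\theta, 1)$ stream with the log-likelihood ratio of $N(\delta, 1)$ versus $N(0, 1)$ as its statistic, and it truncates every path at a hypothetical horizon `n_max`; the function name and all defaults are illustrative, not the authors' implementation.

```python
import numpy as np

def crossing_prob(a_s, b_1, theta, delta=1.0, n_max=200, reps=20000, seed=0):
    """Estimate P_theta(Lambda(n) <= a_s for some n <= n_max, Lambda(n') < b_1 for all n' < n).

    Assumes a_s < b_1, so a path cannot cross both values at the same time.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(theta, 1.0, size=(reps, n_max))
    n = np.arange(1, n_max + 1)
    llr = delta * np.cumsum(x, axis=1) - n * delta**2 / 2.0
    hit_low = llr <= a_s
    hit_high = llr >= b_1
    # Index of the first crossing on each path, or n_max if it never happens.
    first_low = np.where(hit_low.any(axis=1), hit_low.argmax(axis=1), n_max)
    first_high = np.where(hit_high.any(axis=1), hit_high.argmax(axis=1), n_max)
    return float(np.mean(first_low < first_high))

a = np.log(0.2 / 0.95)    # hypothetical candidate lower critical value
b = np.log(0.8 / 0.05)    # hypothetical candidate upper critical value
print(crossing_prob(a, b, theta=1.0))   # estimated chance of wrongly accepting H
```

A calibration loop would adjust `a_s` until the estimate drops to the required bound for each $s$; the search strategy is left open here.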

As mentioned in the introduction, this subject, which lies at the intersection of sequential
analysis and multiple testing, is quite young and therefore still has many interesting and fun- 
damental unanswered questions surrounding it. These include optimality theory for sequential 
multiple testing procedures, as well as calculations or estimates of their operating character- 
istics such as achieved familywise error rates and expected total and streamwise-maximum 
sample size. 

Appendix: Proofs and Details of Simulation Studies 

Proofs of theorems 

Proof of Theorem 2.1. We fix $\theta$ and, for simplicity, omit it from the notation that follows.
We first prove that $\mathrm{FWE}_{II} \le \beta$. Since each $\varphi^{(i)}(\cdot)$ is strictly increasing and
satisfies (6),

    $\Lambda^{(i)}(n) \le A_s^{(i)} \Leftrightarrow \widetilde{\Lambda}^{(i)}(n) \le -(k - s + 1)$    (28)

    $\Lambda^{(i)}(n) \ge B_s^{(i)} \Leftrightarrow \widetilde{\Lambda}^{(i)}(n) \ge k - s + 1$    (29)

for any $i, s \in [k]$. If $\mathcal{F} = \emptyset$ then $\mathrm{FWE}_{II} = 0$, so assume that
$\mathcal{F} \ne \emptyset$. Let

    $S_j = \{m \in [k] : i(j,m) \in I_j \cap \mathcal{F} \text{ and } \widetilde{\Lambda}^{(i(j,m))}(n_j) \le -(k - a_j - m + 1)\}$

and let $j^*$ denote the earliest stage at which a type II error occurs, taking the value $\infty$ if
no such error occurs; to prove that $\mathrm{FWE}_{II} \le \beta$ we thus assume without loss of generality
that $j^* < \infty$ with probability 1. By our assumptions and by definition of $j^*$,
$S_{j^*} \ne \emptyset$, so let $m^* = \min S_{j^*}$. By partitioning $\mathcal{F}$ we have

    $|\mathcal{F}| = |\mathcal{F} \cap I_{j^*}| + |\{i \in \mathcal{F} : H^{(i)} \text{ rejected at some stage } 1, \ldots, j^* - 1\}|$.    (30)

By definition of $m^*$, the first term on the right-hand side of (30) is bounded above by

    $|I_{j^*}| - (m^* - 1) = (k - a_{j^*} - r_{j^*}) - (m^* - 1)$,

and the second term on the right-hand side of (30) is bounded above by $r_{j^*}$. Combining these
two gives $|\mathcal{F}| \le k - a_{j^*} - m^* + 1$, or

    $a_{j^*} + m^* \le k - |\mathcal{F}| + 1$.    (31)

Letting $V_i = \{H^{(i)} \text{ accepted and } i = i(j^*, m^*)\}$, we thus have

    $V_i \subseteq \{\widetilde{\Lambda}^{(i)}(n_{j^*}) \le -(k - a_{j^*} - m^* + 1)\}$
    $\phantom{V_i} = \{\Lambda^{(i)}(n_{j^*}) \le A^{(i)}_{a_{j^*} + m^*}\}$    (by (28))
    $\phantom{V_i} \subseteq \{\Lambda^{(i)}(n_{j^*}) \le A^{(i)}_{k - |\mathcal{F}| + 1}\}$    (by (31)).    (32)

We will also show that

    $V_i \subseteq \{\Lambda^{(i)}(n) < B_1^{(i)} \text{ for all } n < n_{j^*}\}$.    (33)

This holds because if $\Lambda^{(i)}(n) \ge B_1^{(i)}$ for some $n < n_{j^*}$, then $H^{(i)}$ would be rejected at
some stage prior to $j^*$. To see this, let $W_i$ be the event on the right-hand side of (33). It is clear
from Step 3b that on $W_i^c$, some hypothesis would be rejected at a stage $j < j^*$ since
$B_1^{(i)} \ge B^{(i)}_{r_j + 1}$ for any value of $r_j$. Let $j' < j^*$ denote the earliest stage such that
$H^{(i)}$ is not rejected before stage $j'$ and

    $\Lambda^{(i)}(n_{j'}) \ge B_1^{(i)}$.    (34)

Let $m$ be such that $i = i(j', |I_{j'}| - m)$. We will show that

    $\widetilde{\Lambda}^{(i(j', |I_{j'}| - \ell))}(n_{j'}) \ge k - r_{j'} - \ell$  for all $1 \le \ell \le m$,

which, by (11), implies that $H^{(i)}$ is rejected at stage $j'$ and finishes the proof of (33). For any
$1 \le \ell \le m$,

    $\widetilde{\Lambda}^{(i(j', |I_{j'}| - \ell))}(n_{j'}) \ge \widetilde{\Lambda}^{(i(j', |I_{j'}| - m))}(n_{j'}) \ge k$    (by (29) and (34))
    $\phantom{\widetilde{\Lambda}^{(i(j', |I_{j'}| - \ell))}(n_{j'})} \ge k - r_{j'} - \ell$.

Combining (32) and (33) we have

    $V_i \subseteq \{\Lambda^{(i)}(n) \le A^{(i)}_{k - |\mathcal{F}| + 1} \text{ some } n,\ \Lambda^{(i)}(n') < B_1^{(i)} \text{ all } n' < n\}$,

and using this we have

    $\mathrm{FWE}_{II} = P\left(\bigcup_{i \in \mathcal{F}} V_i\right) \le \sum_{i \in \mathcal{F}} P(V_i)$
    $\phantom{\mathrm{FWE}_{II}} \le \sum_{i \in \mathcal{F}} P(\Lambda^{(i)}(n) \le A^{(i)}_{k - |\mathcal{F}| + 1} \text{ some } n,\ \Lambda^{(i)}(n') < B_1^{(i)} \text{ all } n' < n)$
    $\phantom{\mathrm{FWE}_{II}} \le \sum_{i \in \mathcal{F}} \beta / |\mathcal{F}|$    (by (3))
    $\phantom{\mathrm{FWE}_{II}} = \beta$.

The proof that $\mathrm{FWE}_I \le \alpha$ is similar so the details are omitted. The only thing that could
make the situation different is the possibility that a hypothesis that would have been rejected
in Step 3b is accepted in Step 3a. However, Remark (A) guarantees that this does not happen.

To prove the second claim of the theorem, we note that the only properties of the
standardizing functions $\varphi^{(i)}$ needed for the proof above are:

1. for all $i \in [k]$, $\varphi^{(i)}(\cdot)$ is strictly increasing;

2. $\varphi^{(i)}(A_s^{(i)}) = \varphi^{(i')}(A_s^{(i')})$ and $\varphi^{(i)}(B_s^{(i)}) = \varphi^{(i')}(B_s^{(i')})$ for all $i, i', s \in [k]$;

3. the right-hand sides of the inequalities in (9) and (11) equal $\varphi^{(i)}(A^{(i)}_{a_j + m + 1})$ and
$\varphi^{(i)}(B^{(i)}_{r_j + m + 1})$, respectively.

If $A_s^{(i)} = A_s^{(i')} = A_s$ and $B_s^{(i)} = B_s^{(i')} = B_s$ for all $i, i', s \in [k]$, then we can instead use
$\varphi^{(i)}(x) = x$ for all $i \in [k]$, which preserves the first two properties in this case, and replacing
the right-hand sides of (9) and (11) by $A_{a_j + m + 1}$ and $B_{r_j + m + 1}$, respectively, satisfies the third
property. □
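
One possible standardizing function with the properties used in the proof can be sketched as a piecewise-linear map. The construction below is an illustration only: it assumes, in line with (28) and (29), that $\varphi$ sends $A_s$ to $-(k - s + 1)$ and $B_s$ to $k - s + 1$, that the critical values are strictly ordered as $A_1 < \cdots < A_k < B_k < \cdots < B_1$, and it extends the map with unit slope outside the knots. The name `make_phi` and the numerical critical values are hypothetical.

```python
import numpy as np

def make_phi(A, B):
    """Piecewise-linear increasing map with phi(A_s) = -(k-s+1) and phi(B_s) = k-s+1."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    k = len(A)
    xs = np.concatenate([A, B[::-1]])                    # A_1..A_k, then B_k..B_1
    ys = np.concatenate([-(k - np.arange(k)), np.arange(1, k + 1)]).astype(float)
    if np.any(np.diff(xs) <= 0):
        raise ValueError("critical values must satisfy A_1 < ... < A_k < B_k < ... < B_1")

    def phi(x):
        x = np.asarray(x, dtype=float)
        inner = np.interp(x, xs, ys)                     # linear between the knots
        below = ys[0] + (x - xs[0])                      # unit-slope extension on the left
        above = ys[-1] + (x - xs[-1])                    # unit-slope extension on the right
        return np.where(x < xs[0], below, np.where(x > xs[-1], above, inner))

    return phi

A = [-3.0, -2.5, -2.0]    # hypothetical A_1, A_2, A_3
B = [4.0, 3.5, 3.0]       # hypothetical B_1, B_2, B_3
phi = make_phi(A, B)
print(phi(A), phi(B))
```

Because every stream's knots are mapped to the same integers, statistics from streams with different critical values become comparable on a common scale, which is the role of property 2 above.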



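Proof of Theorem 3.1. First note that $\alpha_s, \beta_s \in (0, 1)$ for all $s \in [k]$ since

    $0 < \alpha_s = \frac{k - s + 1 - \beta}{k - s + 1} \cdot \frac{\alpha}{k - \beta} < \frac{\alpha}{k - \beta} < \frac{1}{k - \beta} < \frac{1}{k - 1} \le 1$

as $k \ge 2$, and similarly for $\beta_s$. $A_s^{(i)}$ and $B_s^{(i)}$ in (18) can be written as
$A(\alpha_s, \beta/(k - s + 1))$ and $B(\alpha/(k - s + 1), \beta_s)$, respectively, and it is simple algebra
to then check that $A(\alpha/(k - s + 1), \beta_s) = A_1^{(i)}$ for any $s \in [k]$. Then, to verify (19),

    $\alpha^{(i)}_{\mathrm{Holm}}(s) = P_{h^{(i)}}(\Lambda^{(i)}(n) \ge B_s^{(i)} \text{ some } n,\ \Lambda^{(i)}(n') > A_1^{(i)} \text{ all } n' < n)$
    $\phantom{\alpha^{(i)}_{\mathrm{Holm}}(s)} = P_{h^{(i)}}(\Lambda^{(i)}(n) \ge B(\alpha/(k - s + 1), \beta_s) \text{ some } n,\ \Lambda^{(i)}(n') > A(\alpha/(k - s + 1), \beta_s) \text{ all } n' < n)$
    $\phantom{\alpha^{(i)}_{\mathrm{Holm}}(s)} = \alpha^{(i)}(\alpha/(k - s + 1), \beta_s)$

by (15). The proof of (20) is similar. □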
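
The "simple algebra" step can be checked numerically. The sketch below assumes the Wald approximations $A(a, b) = \log[b/(1 - a)]$ and $B(a, b) = \log[(1 - b)/a]$, the form of $\alpha_s$ appearing in the display above, and a mirrored form for $\beta_s$; the latter is an assumption, since (18) is not restated in this appendix. All function names are hypothetical.

```python
import math

def wald_A(a, b):
    """Lower Wald boundary for nominal error probabilities (a, b)."""
    return math.log(b / (1.0 - a))

def wald_B(a, b):
    """Upper Wald boundary for nominal error probabilities (a, b)."""
    return math.log((1.0 - b) / a)

def alpha_s(alpha, beta, k, s):
    m = k - s + 1
    return (m - beta) / m * alpha / (k - beta)

def beta_s(alpha, beta, k, s):
    # Assumed by symmetry with alpha_s (roles of alpha and beta exchanged).
    m = k - s + 1
    return (m - alpha) / m * beta / (k - alpha)

alpha, beta, k = 0.05, 0.2, 10
A1 = wald_A(alpha / k, beta / k)
B1 = wald_B(alpha / k, beta / k)
for s in range(1, k + 1):
    m = k - s + 1
    gap_A = wald_A(alpha / m, beta_s(alpha, beta, k, s)) - A1
    gap_B = wald_B(alpha_s(alpha, beta, k, s), beta / m) - B1
    print(s, f"{gap_A:.1e}", f"{gap_B:.1e}")
```

Under these assumptions both gaps vanish up to floating-point error for every $s$, so the pairs $(A_1, B_s)$ and $(A_s, B_1)$ are each the Wald boundaries of a single SPRT.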



Details of Simulation Studies 

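The four covariance matrices used in the simulations for Section 4.2 are as follows:

    $M_1 = \begin{pmatrix} 1 & 0.8 \\ 0.8 & 1 \end{pmatrix}, \qquad M_2 = \begin{pmatrix} 1 & -0.8 \\ -0.8 & 1 \end{pmatrix},$

    $M_3 = \begin{pmatrix}
     1 & 0.8 & -0.6 & -0.8 \\
     0.8 & 1 & -0.6 & -0.8 \\
     -0.6 & -0.6 & 1 & 0.8 \\
     -0.8 & -0.8 & 0.8 & 1
    \end{pmatrix},$

    $M_4 = \begin{pmatrix}
     1 & 0.8 & 0.6 & -0.4 & -0.6 & -0.8 \\
     0.8 & 1 & 0.8 & -0.4 & -0.6 & -0.8 \\
     0.6 & 0.8 & 1 & -0.4 & -0.6 & -0.8 \\
     -0.4 & -0.4 & -0.4 & 1 & 0.8 & 0.6 \\
     -0.6 & -0.6 & -0.6 & 0.8 & 1 & 0.8 \\
     -0.8 & -0.8 & -0.8 & 0.6 & 0.8 & 1
    \end{pmatrix}.$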
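
Before using matrices like these to generate data, it is worth confirming that each one is a valid covariance matrix. The sketch below enters the four matrices as read from this appendix, checks symmetry, and relies on the Cholesky factorization both as a positive-definiteness check and as a way to draw correlated normal vectors. The mean vector and seed are hypothetical.

```python
import numpy as np

M1 = np.array([[1.0, 0.8], [0.8, 1.0]])
M2 = np.array([[1.0, -0.8], [-0.8, 1.0]])
M3 = np.array([[ 1.0,  0.8, -0.6, -0.8],
               [ 0.8,  1.0, -0.6, -0.8],
               [-0.6, -0.6,  1.0,  0.8],
               [-0.8, -0.8,  0.8,  1.0]])
M4 = np.array([[ 1.0,  0.8,  0.6, -0.4, -0.6, -0.8],
               [ 0.8,  1.0,  0.8, -0.4, -0.6, -0.8],
               [ 0.6,  0.8,  1.0, -0.4, -0.6, -0.8],
               [-0.4, -0.4, -0.4,  1.0,  0.8,  0.6],
               [-0.6, -0.6, -0.6,  0.8,  1.0,  0.8],
               [-0.8, -0.8, -0.8,  0.6,  0.8,  1.0]])

factors = {}
for name, M in (("M1", M1), ("M2", M2), ("M3", M3), ("M4", M4)):
    assert np.allclose(M, M.T), f"{name} is not symmetric"
    factors[name] = np.linalg.cholesky(M)   # raises LinAlgError if M is not positive definite
    print(name, M.shape, "smallest eigenvalue:", float(np.linalg.eigvalsh(M).min()))

# Draw five observations of a 6-dimensional vector with covariance M4.
rng = np.random.default_rng(0)
theta = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])      # hypothetical mean vector
x = theta + rng.standard_normal((5, 6)) @ factors["M4"].T
```

All four matrices have unit diagonals, so every stream keeps variance 1 and the off-diagonal entries are the correlations between streams.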